Combining Word Embedding and Lexical Database for Semantic Relatedness Measurement

Authors

  • Yang-Yin Lee
  • Hao Ke
  • Hen-Hsen Huang
  • Hsin-Hsi Chen
Abstract

While many traditional studies on semantic relatedness rely on lexical databases such as WordNet or Wiktionary, recent word embedding approaches have demonstrated their ability to capture syntactic and semantic information and to outperform lexicon-based methods. However, word senses are not disambiguated in the training phase of Word2Vec and GloVe, two widely used word embedding algorithms, and the path length between two word senses in a lexical database does not necessarily reflect their true semantic relatedness. In this paper, a novel approach that linearly combines Word2Vec and GloVe with the lexical database WordNet is proposed for measuring semantic relatedness. Experiments show that this simple method outperforms the state-of-the-art model SensEmbed.
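To make the idea of such a combination concrete, the following Python sketch blends embedding-based cosine similarity with a WordNet-based score. This is only an illustration under assumptions: the weights alpha, beta, and gamma, the max-over-senses path similarity, and the toy random vectors are not the authors' exact formulation.

import numpy as np
from nltk.corpus import wordnet as wn  # requires the WordNet corpus: nltk.download('wordnet')

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def wordnet_score(w1, w2):
    # Best path similarity over all sense pairs; 0.0 when no path exists.
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(w1) for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def combined_relatedness(w1, w2, word2vec, glove, alpha=0.4, beta=0.4, gamma=0.2):
    # Linear combination of the three scores; the weights here are assumptions.
    return (alpha * cosine(word2vec[w1], word2vec[w2])
            + beta * cosine(glove[w1], glove[w2])
            + gamma * wordnet_score(w1, w2))

# Toy usage: random vectors stand in for pretrained Word2Vec/GloVe embeddings.
rng = np.random.default_rng(0)
word2vec = {w: rng.normal(size=300) for w in ("car", "automobile")}
glove = {w: rng.normal(size=300) for w in ("car", "automobile")}
print(combined_relatedness("car", "automobile", word2vec, glove))

In practice the vectors would come from pretrained Word2Vec and GloVe models, and the combination weights would be tuned on a word-relatedness benchmark; the paper's exact weighting and scoring details are not reproduced here.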

Similar articles

Better explanations of lexical and semantic cognition using networks derived from continued rather than single-word associations.

In this article, we describe the most extensive set of word associations collected to date. The database contains over 12,000 cue words for which more than 70,000 participants generated three responses in a multiple-response free association task. The goal of this study was (1) to create a semantic network that covers a large part of the human lexicon, (2) to investigate the implications of a m...

Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?

Distributed representations of words learned from text have proved successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using a predictive model (Word2vec) or a dense count-based model (GloVe), others attempt to represent these in a distributional thesaurus network structure where the neighborhood of a word i...

Combining Word Representations for Measuring Word Relatedness and Similarity

Many unsupervised methods, such as Latent Semantic Analysis and Latent Dirichlet Allocation, have been proposed to automatically infer word representations in the form of a vector. By representing a word as a vector, one can exploit the power of vector algebra to solve many Natural Language Processing tasks, e.g., by computing the cosine similarity between the corresponding word vectors the seman...

IndoWordnet Visualizer: A Graphical User Interface for Browsing and Exploring Wordnets of Indian Languages

In this paper, we present a graphical user interface to browse and explore the IndoWordnet lexical database for various Indian languages. The IndoWordnet visualizer extracts the related concepts for a given word and displays a subgraph containing those concepts. The interface is enhanced with different features in order to provide flexibility to the user. IndoWordnet visualizer is made publ...

Combining Heterogeneous Knowledge Resources for Improved Distributional Semantic Models

The Explicit Semantic Analysis (ESA) model, based on term co-occurrences in Wikipedia, has been regarded as a state-of-the-art semantic relatedness measure in recent years. We provide an analysis of the important parameters of ESA using datasets in five different languages. Additionally, we propose the use of ESA with multiple lexical semantic resources, thus exploiting multiple evidence of term ...

Journal title:

Volume:   Issue:

Pages:   -

Publication date: 2016